A simple try assisted with GPT: nonblocking gather/scatter exchanges #7396

Open
mystic-qaq wants to merge 9 commits into deepmodeling:develop from mystic-qaq:feat/unblock

Conversation

@mystic-qaq

@mohanchen added the labels Refactor (Refactor ABACUS codes) and Tests/Examples (Issues/PR related to unit tests and integrate tests) on May 29, 2026
@Cstandardlib requested a review from Copilot on May 29, 2026, 14:01

Copilot AI left a comment

Pull request overview

Refactors PW_Basis::gatherp_scatters and PW_Basis::gathers_scatterp from blocking MPI_Alltoallv to non-blocking MPI_Irecv/MPI_Isend exchanges with overlapping pack/unpack work, adds a per-instance reusable communication workspace, and introduces a round-trip unit test.

Changes:

  • Replace MPI_Alltoallv with non-blocking sends/receives plus MPI_Waitsome-driven unpack overlap in both gather/scatter directions, separating send/receive into distinct workspace slices.
  • Add acquire_comm_workbuf<T>() returning per-instance mutable std::vector storage (float and double specializations) and add fine-grained timer regions.
  • Add test_comm_roundtrip.cpp (round-trip equality and a zero-plane "stress" layout sweep) and register it in the test CMakeLists.txt.
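The non-MPI half of the exchange pattern described above (disjoint send and receive slices in one workspace, plus a manual self-copy for the local rank's block) can be sketched as follows. This is a hypothetical illustration, not the PR's code: the helper names are invented, and the MPI_Irecv/MPI_Isend/MPI_Waitsome calls are deliberately omitted so the sketch compiles without MPI.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Exclusive prefix sum: per-rank element counts -> offsets in a packed buffer.
// A rank that owns zero planes gets count 0 and shares its neighbour's offset.
std::vector<int> displacements_from_counts(const std::vector<int>& counts)
{
    std::vector<int> displs(counts.size(), 0);
    for (std::size_t r = 1; r < counts.size(); ++r)
        displs[r] = displs[r - 1] + counts[r - 1];
    return displs;
}

// Hypothetical helper: lay out one workspace as a send slice followed by a
// receive slice, then move the local rank's own block with a plain copy
// instead of sending a message to itself.
template <typename T>
void self_copy_through_workspace(std::vector<T>& workspace,
                                 const std::vector<int>& send_counts,
                                 const std::vector<int>& recv_counts,
                                 int my_rank)
{
    const int send_total = std::accumulate(send_counts.begin(), send_counts.end(), 0);
    const int recv_total = std::accumulate(recv_counts.begin(), recv_counts.end(), 0);
    assert(workspace.size() >= static_cast<std::size_t>(send_total + recv_total));
    assert(send_counts[my_rank] == recv_counts[my_rank]);

    T* send = workspace.data();              // packed outgoing data
    T* recv = workspace.data() + send_total; // incoming data, never aliases send

    const std::vector<int> sdispls = displacements_from_counts(send_counts);
    const std::vector<int> rdispls = displacements_from_counts(recv_counts);

    // In an MPI version, MPI_Irecv/MPI_Isend would be posted here for every
    // peer != my_rank, using these same counts and displacements.
    for (int i = 0; i < send_counts[my_rank]; ++i)
        recv[rdispls[my_rank] + i] = send[sdispls[my_rank] + i];
}
```

Keeping the slices disjoint matters because MPI forbids modifying a buffer passed to MPI_Isend until that request completes; unpacking received blocks therefore must not write into the send slice while messages are still in flight.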

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.

| File | Description |
|---|---|
| source/source_basis/module_pw/pw_gatherscatter.h | Rewrites both routines to use Irecv/Isend with manual self-copy, dedicated send/recv workspace, and overlapped unpack via MPI_Waitsome. |
| source/source_basis/module_pw/pw_basis.h | Declares acquire_comm_workbuf plus mutable per-instance buffers; adds <vector> include. |
| source/source_basis/module_pw/test/test_comm_roundtrip.cpp | New round-trip tests using a friend accessor subclass to call the protected gather/scatter methods. |
| source/source_basis/module_pw/test/CMakeLists.txt | Registers the new test source file. |
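The accessor-subclass testing idea, combined with a round-trip check, can be sketched like this. The class and its "gather"/"scatter" functions are hypothetical stand-ins (a rotation and its inverse), not PW_Basis code, and this sketch republishes the protected members with using-declarations; a friend declaration is another way to grant test-only access.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for a class whose gather/scatter routines are protected.
class Basis
{
  protected:
    // "Gather": rotate left by one position.
    std::vector<double> gather(const std::vector<double>& in) const
    {
        std::vector<double> out(in);
        if (!out.empty())
            std::rotate(out.begin(), out.begin() + 1, out.end());
        return out;
    }

    // "Scatter": rotate right by one position, the exact inverse of gather().
    std::vector<double> scatter(const std::vector<double>& in) const
    {
        std::vector<double> out(in);
        if (!out.empty())
            std::rotate(out.begin(), out.end() - 1, out.end());
        return out;
    }
};

// Test-only accessor: using-declarations republish the protected members as
// public, so production code keeps them hidden.
class BasisAccessor : public Basis
{
  public:
    using Basis::gather;
    using Basis::scatter;
};

// Round-trip property: scatter(gather(x)) must reproduce x exactly.
bool round_trip_ok(const std::vector<double>& x)
{
    BasisAccessor acc;
    return acc.scatter(acc.gather(x)) == x;
}
```

Checking exact equality after a full round trip is a useful property test for pure data-movement routines: no arithmetic is involved, so any mismatch points to a wrong count, displacement, or overwritten buffer. Including the empty input mirrors the spirit of a zero-plane layout.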

@mohanchen added the project_learning label and removed the Refactor (Refactor ABACUS codes) label on May 29, 2026
std::string precision = "double"; ///< single, double, mixing
bool double_data_ = true; ///< if has double data
bool float_data_ = false; ///< if has float data
mutable std::vector<std::complex<float>> comm_workbuf_float_;
Collaborator

Using the mutable keyword is not recommended: it breaks const semantics, hides state changes,
and introduces potential thread-safety risks. Use it only as a last resort.

mystic-qaq and others added 6 commits on May 31, 2026, 16:21
…rk buffers

Remove the mutable keyword from comm_workbuf_float_ and
comm_workbuf_double_ by switching from std::vector (whose data()
returns const T* when the vector is accessed through a const object)
to std::unique_ptr<T[]> (whose const get() member returns a non-const T*).

Key changes:
- Pre-allocate work buffers in allocate_comm_buffers() called from
  getstartgr(), using the already-computed numr/startr/numg/startg
  arrays to determine the maximum required buffer size
- acquire_comm_workbuf<T>() no longer resizes lazily; it returns the
  pre-allocated buffer via unique_ptr::get() with an assertion guard
- Add cleanup in destructor via unique_ptr::reset()

Rationale: unique_ptr::get() is a const member function that returns
a non-const T*, matching the semantic intent: a const PW_Basis does
not re-seat the buffer pointer, but the pointed-to scratch memory
remains writable for MPI operations. This removes the need for mutable
while keeping const-correctness throughout the gather/scatter call
chain. The buffers are still per-instance shared scratch, however, so
concurrent calls on the same PW_Basis object would race on them.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
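The pre-allocate-then-acquire scheme from the commit above can be sketched as follows. CommScratch, allocate(), and acquire() are hypothetical stand-ins for PW_Basis, allocate_comm_buffers(), and acquire_comm_workbuf<T>(), and the sizing rule (total send elements plus total receive elements, each recovered as start.back() + num.back()) is an assumption; the PR's actual "maximum required buffer size" formula is not shown here. The static_assert checks the language property the commit relies on.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <memory>
#include <type_traits>
#include <utility>
#include <vector>

// get() on a *const* unique_ptr<T[]> still yields a non-const T*.
static_assert(std::is_same<decltype(std::declval<const std::unique_ptr<int[]>&>().get()),
                           int*>::value,
              "unique_ptr<T[]>::get() const returns T*");

// Hypothetical: packed length of a num/start pair laid out contiguously.
std::size_t packed_total(const std::vector<int>& num, const std::vector<int>& start)
{
    return num.empty() ? 0 : static_cast<std::size_t>(start.back() + num.back());
}

// Hypothetical stand-in for the scratch buffers held by PW_Basis.
class CommScratch
{
  public:
    // Size and allocate both buffers once, up front (assumed sizing rule).
    void allocate(const std::vector<int>& numr, const std::vector<int>& startr,
                  const std::vector<int>& numg, const std::vector<int>& startg)
    {
        size_ = packed_total(numr, startr) + packed_total(numg, startg);
        buf_float_ = std::make_unique<std::complex<float>[]>(size_);
        buf_double_ = std::make_unique<std::complex<double>[]>(size_);
    }

    std::size_t size() const { return size_; }

    // Usable from const member functions without `mutable`.
    template <typename FPTYPE>
    std::complex<FPTYPE>* acquire(std::size_t n) const;

  private:
    std::unique_ptr<std::complex<float>[]> buf_float_;
    std::unique_ptr<std::complex<double>[]> buf_double_;
    std::size_t size_ = 0;
};

template <>
inline std::complex<float>* CommScratch::acquire<float>(std::size_t n) const
{
    assert(buf_float_ && n <= size_); // guard: pre-allocated and large enough
    return buf_float_.get();
}

template <>
inline std::complex<double>* CommScratch::acquire<double>(std::size_t n) const
{
    assert(buf_double_ && n <= size_);
    return buf_double_.get();
}

// A function taking a const reference can still write the scratch memory.
double fill_and_sum(const CommScratch& s, std::size_t n)
{
    std::complex<double>* w = s.acquire<double>(n);
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i)
    {
        w[i] = std::complex<double>(static_cast<double>(i), 0.0);
        sum += w[i].real();
    }
    return sum;
}
```

Because std::unique_ptr releases its array when the owning object is destroyed, an explicit reset() in the destructor is harmless but not required.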

Include a standalone microbenchmark (bench_comm.cpp) comparing blocking
vs. nonblocking MPI gather/scatter, and PR_DESCRIPTION.md with the design
rationale and performance validation results.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Replace the simplified microbenchmark with one that directly
calls PW_Basis::gatherp_scatters()/gathers_scatterp() (feat/unblock)
and compares them against the exact blocking implementations from the
develop branch. Uses realistic ABACUS parameters (10 Å cell, ecut = 100 Ry,
64^3 FFT grid).

Key results: the nonblocking version is 1.06x to 1.45x faster at 3 or more
MPI ranks, with a maximum speedup of 1.45x at 4 ranks and 2 OpenMP threads.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Labels

project_learning, Tests/Examples (Issues/PR related to unit tests and integrate tests)

Projects

None yet

3 participants